PLS-Based and Regularization-Based Methods for the Selection of Relevant Variables in Non-targeted Metabolomics Data
نویسندگان
چکیده
Non-targeted metabolomics constitutes a part of the systems biology and aims at determining numerous metabolites in complex biological samples. Datasets obtained in the non-targeted metabolomics studies are high-dimensional due to sensitivity of mass spectrometry-based detection methods as well as complexity of biological matrices. Therefore, a proper selection of variables which contribute into group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study). The orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without and with multiple testing correction as well as the least absolute shrinkage and selection operator (LASSO) with bootstrapping, were tested and compared. For the RH study, OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on the VIP criteria using Pareto and UV scaling, respectively. For the PH study, 217 and 320 variables were selected based on the VIP criteria using Pareto and UV scaling, respectively. In the RH study, OPLS-DA model built after correcting for multiple testing, selected 4 and 19 variables as in terms of Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on the VIP criteria in terms of Pareto and UV scaling, respectively. In the RH and PH study, the LASSO selected 14 and 4 variables with reproducibility between 99.3 and 100%, respectively. In the light of PLS-based models, the larger the search space the higher the probability of developing models that fit the training data well with simultaneous poor predictive performance on the validation set. The LASSO offers potential improvements over standard linear regression due to the presence of the constrain, which promotes sparse solutions. This paper is the first one to date utilizing the LASSO penalized logistic regression in untargeted metabolomics studies.
منابع مشابه
Serum-based metabolic alterations in patients with papillary thyroid carcinoma unveiled by non-targeted 1H-NMR metabolomics approach
Objective(s): As the most prevalent endocrine system malignancy, papillary thyroid carcinoma had a very fast rising incidence in recent years for unknown reasons besides the fact that the current methods in thyroid cancer diagnosis still hold some limitations. Therefore, the aim of this study was to improve the potential molecular markers for diagnosis of benign and malignant thyroid nodules to...
متن کاملA Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...
متن کاملComparative Approach to the Backward Elimination and for-ward Selection Methods in Modeling the Systematic Risk Based on the ARFIMA-FIGARCH Model
The present study aims to model systematic risk using financial and accounting variables. Accordingly, the data for 174 companies in Tehran Stock Exchange are extracted for the period of 2006 to 2016. First, the systematic risk index is estimated using the ARFIMA-FIGARCH model. Then, based on the research background, 35 affective financial and accounting variables are simultaneously used with t...
متن کاملO-24: Single Oocyte Secretoma Mapping by NMR-Metabolomics Technology: A Non-Invasive Strategy to Select The Best Oocytes to Fertilize Avoiding Supernumerary Embryos and Increasing Take- Home-Baby-Rate after IVF
Background Current strategies based on random selection of MII-oocytes to fertilize appear unsatisfactory in selecting the best number and the most vital oocytes to fertilize especially in poor responder women in which “chronological age” does not mismatch with “biological age”. The metabolomics- profiling approach, evaluating the final products of cell regulatory process (genome/transcriptome/...
متن کاملA Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems
Intrusion detection systems are designed to provide security in computer networks, so that if the attacker crosses other security devices, they can detect and prevent the attack process. One of the most essential challenges in designing these systems is the so called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...
متن کامل